Search Results

Documents authored by Schmidt, Sebastian


Document
Cut Paths and Their Remainder Structure, with Applications

Authors: Massimo Cairo, Shahbaz Khan, Romeo Rizzi, Sebastian Schmidt, Alexandru I. Tomescu, and Elia C. Zirondelli

Published in: LIPIcs, Volume 254, 40th International Symposium on Theoretical Aspects of Computer Science (STACS 2023)


Abstract
In a strongly connected graph G = (V,E), a cut arc (also called strong bridge) is an arc e ∈ E whose removal makes the graph no longer strongly connected. Equivalently, there exist u,v ∈ V, such that all u-v walks contain e. Cut arcs are a fundamental graph-theoretic notion, with countless applications, especially in reachability problems. In this paper we initiate the study of cut paths, as a generalisation of cut arcs, which we naturally define as those paths P for which there exist u,v ∈ V, such that all u-v walks contain P as subwalk. We first prove various properties of cut paths and define their remainder structures, which we use to present a simple O(m)-time verification algorithm for a cut path (|V| = n, |E| = m). Secondly, we apply cut paths and their remainder structures to improve several reachability problems from bioinformatics, as follows. A walk is called safe if it is a subwalk of every node-covering closed walk of a strongly connected graph. Multi-safety is defined analogously, by considering node-covering sets of closed walks instead. We show that cut paths provide simple O(m)-time algorithms verifying if a walk is safe or multi-safe. For multi-safety, we present the first linear time algorithm, while for safety, we present a simple algorithm where the state-of-the-art employed complex data structures. Finally we show that the simultaneous computation of remainder structures of all subwalks of a cut path can be performed in linear time, since they are related in a structured way. These properties yield an O(mn)-time algorithm outputting all maximal multi-safe walks, improving over the state-of-the-art algorithm running in time O(m²+n³). The results of this paper only scratch the surface in the study of cut paths, and we believe a rich structure of a graph can be revealed, considering the perspective of a path, instead of just an arc.

Cite as

Massimo Cairo, Shahbaz Khan, Romeo Rizzi, Sebastian Schmidt, Alexandru I. Tomescu, and Elia C. Zirondelli. Cut Paths and Their Remainder Structure, with Applications. In 40th International Symposium on Theoretical Aspects of Computer Science (STACS 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 254, pp. 17:1-17:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{cairo_et_al:LIPIcs.STACS.2023.17,
  author =	{Cairo, Massimo and Khan, Shahbaz and Rizzi, Romeo and Schmidt, Sebastian and Tomescu, Alexandru I. and Zirondelli, Elia C.},
  title =	{{Cut Paths and Their Remainder Structure, with Applications}},
  booktitle =	{40th International Symposium on Theoretical Aspects of Computer Science (STACS 2023)},
  pages =	{17:1--17:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-266-2},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{254},
  editor =	{Berenbrink, Petra and Bouyer, Patricia and Dawar, Anuj and Kant\'{e}, Mamadou Moustapha},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2023.17},
  URN =		{urn:nbn:de:0030-drops-176690},
  doi =		{10.4230/LIPIcs.STACS.2023.17},
  annote =	{Keywords: reachability, cut arc, strong bridge, covering walk, safety, persistence, essentiality, genome assembly}
}
Document
Eulertigs: Minimum Plain Text Representation of k-mer Sets Without Repetitions in Linear Time

Authors: Sebastian Schmidt and Jarno N. Alanko

Published in: LIPIcs, Volume 242, 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)


Abstract
A fundamental operation in computational genomics is to reduce the input sequences to their constituent k-mers. For maximum performance of downstream applications it is important to store the k-mers in small space, while keeping the representation easy and efficient to use (i.e. without k-mer repetitions and in plain text). Recently, heuristics were presented to compute a near-minimum such representation. We present an algorithm to compute a minimum representation in optimal (linear) time and use it to evaluate the existing heuristics. For that, we present a formalisation of arc-centric bidirected de Bruijn graphs and carefully prove that it accurately models the k-mer spectrum of the input. Our algorithm first constructs the de Bruijn graph in linear time in the length of the input strings (for a fixed-size alphabet). Then it uses a Eulerian-cycle-based algorithm to compute the minimum representation, in time linear in the size of the output.

Cite as

Sebastian Schmidt and Jarno N. Alanko. Eulertigs: Minimum Plain Text Representation of k-mer Sets Without Repetitions in Linear Time. In 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 242, pp. 2:1-2:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{schmidt_et_al:LIPIcs.WABI.2022.2,
  author =	{Schmidt, Sebastian and Alanko, Jarno N.},
  title =	{{Eulertigs: Minimum Plain Text Representation of k-mer Sets Without Repetitions in Linear Time}},
  booktitle =	{22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)},
  pages =	{2:1--2:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-243-3},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{242},
  editor =	{Boucher, Christina and Rahmann, Sven},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2022.2},
  URN =		{urn:nbn:de:0030-drops-170361},
  doi =		{10.4230/LIPIcs.WABI.2022.2},
  annote =	{Keywords: Spectrum preserving string sets, Eulerian cycle, Suffix tree, Bidirected arc-centric de Bruijn graph, k-mer based methods}
}
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail